# @nayibbukele Twitter/X Corpus (2012-2026)

## Dataset Overview

This dataset contains metadata for tweets published by Nayib Bukele (@nayibbukele)
on the Twitter/X platform, collected for academic research on democratic erosion
in Latin America.

---

## Files Included

| File | Description |
|------|-------------|
| tweets_bukele_ids_publico.csv | Public dataset without tweet text |
| codebook_bukele.docx | Full variable descriptions and methodology |
| scraper.py | Python module for tweet collection via Playwright |
| SCRAP_X_BUKELE_FINAL.R | R orchestration script |

---

## Dataset Structure (tweets_bukele_ids_publico.csv)

| Variable | Type | Description |
|----------|------|-------------|
| id | character | Unique tweet ID (Snowflake format — treat as text, not number) |
| fecha | character | Publication timestamp (ISO 8601, UTC) |
| likes | integer | Like count at time of collection |
| retweets | integer | Retweet count at time of collection |
| replies | integer | Reply count at time of collection |
| es_respuesta | logical | TRUE if tweet is a reply to another user |
| periodo | character | Political period classification |
| fecha_local | datetime | Timestamp in El Salvador local time (UTC-6) |
| tipo_tweet | character | Tweet type: original or respuesta |

NOTE: Tweet text (texto) and URL (url) columns are excluded from this 
public release in accordance with Twitter/X Developer Policy. 
See "Reconstructing Tweet Text" below.
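
A minimal loading sketch, using only the Python standard library, keeps `id` as text and parses `fecha` as a timezone-aware UTC timestamp. The helper name `read_rows` and the derived key `fecha_local_check` are illustrative and not part of the dataset. The code also assumes that `fecha` is a string `datetime.fromisoformat` accepts once a trailing `Z` is normalized.

```python
import csv
from datetime import datetime, timedelta, timezone

# El Salvador is UTC-6 year-round (no daylight saving time).
EL_SALVADOR = timezone(timedelta(hours=-6))


def read_rows(path):
    """Yield one dict per tweet, keeping `id` as text and parsing `fecha` as UTC."""
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            # csv.DictReader returns every field as str, so `id` is never
            # coerced to a number.
            stamp = datetime.fromisoformat(row["fecha"].replace("Z", "+00:00"))
            if stamp.tzinfo is None:
                # Assumption: a timestamp without an offset is already UTC.
                stamp = stamp.replace(tzinfo=timezone.utc)
            row["fecha"] = stamp.astimezone(timezone.utc)
            # Recomputed local time, useful as a cross-check of `fecha_local`.
            row["fecha_local_check"] = row["fecha"].astimezone(EL_SALVADOR)
            yield row
```

Keeping the ID as text matters because Snowflake IDs can exceed 2^53, so a round trip through a floating-point number may silently change the trailing digits.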

---

## Political Period Classification

| Code | Label | Date Range | Context |
|------|-------|------------|---------|
| P0_prepresidencia | Pre-presidency | Jan 8, 2012 - May 31, 2019 | Mayor of Nuevo Cuscatlan (2012-2015), Mayor of San Salvador (2015-2018), presidential campaign (2019) |
| P1_presidencia1 | Presidency 1 | Jun 1, 2019 - May 31, 2024 | First presidential term. Includes military incursion into the Legislative Assembly (Feb 2020), Bitcoin Law (Jun 2021), State of Exception (Mar 2022-present), re-election bid (2023-24) |
| P2_presidencia2 | Presidency 2 | Jun 1, 2024 - present | Second presidential term following constitutionally contested re-election |
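
The classification above can be sketched as a small function. This is an illustration, not the project's own code: the name `classify_period` is hypothetical, and the sketch assumes the period boundaries fall at midnight El Salvador local time (UTC-6), which this README does not state explicitly.

```python
from datetime import datetime, timedelta, timezone

EL_SALVADOR = timezone(timedelta(hours=-6))

# Assumption: each period starts at local midnight on the listed date.
P1_START = datetime(2019, 6, 1, tzinfo=EL_SALVADOR)
P2_START = datetime(2024, 6, 1, tzinfo=EL_SALVADOR)


def classify_period(moment):
    """Map a timezone-aware timestamp to one of the three period codes."""
    if moment.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    if moment < P1_START:
        return "P0_prepresidencia"
    if moment < P2_START:
        return "P1_presidencia1"
    return "P2_presidencia2"
```

Under this assumption, a tweet stamped 2019-06-01 05:59 UTC is still 31 May in El Salvador and falls in `P0_prepresidencia`. If the boundaries were instead applied in UTC, tweets from the first six hours of each transition day would be classified differently.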

---

## Coverage Summary

| Period | Tweets | % of Total |
|--------|--------|------------|
| P0 Pre-presidency | 7,096 | 43.2% |
| P1 Presidency 1 | 8,775 | 53.4% |
| P2 Presidency 2 | 559 | 3.4% |
| TOTAL | 16,430 | 100% |

Content: Original tweets and replies only. Retweets excluded by design.
Languages: All languages (no filter applied).
Metrics: Captured at collection time (February-March 2026).
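
The shares in the table follow directly from the counts. The short check below recomputes them; the function name `period_shares` is illustrative, and the dictionary keys reuse the period codes from the classification table.

```python
# Tweet counts per period, copied from the coverage table.
COUNTS = {
    "P0_prepresidencia": 7096,
    "P1_presidencia1": 8775,
    "P2_presidencia2": 559,
}


def period_shares(counts):
    """Return each period's share of the total, in percent, rounded to one decimal."""
    total = sum(counts.values())
    return {period: round(100 * n / total, 1) for period, n in counts.items()}


shares = period_shares(COUNTS)
print(sum(COUNTS.values()), shares)
```

The same function can be applied to counts tallied from the `periodo` column, for example with `collections.Counter`, to confirm that a downloaded copy of the CSV matches this summary.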

---

## Reconstructing Tweet Text (Hydration)

Tweet text can be reconstructed from the provided IDs using the Twitter/X API:

1. Create a Twitter/X Developer account at developer.twitter.com
2. Apply for API access (Basic tier or higher)
3. Use the GET /2/tweets endpoint with the tweet IDs from this dataset
4. Important: Tweets deleted after collection will not resolve. In a multi-ID lookup they are reported in the response's `errors` array rather than in `data`.

Example using Python (tweepy v4+, which provides `tweepy.Client` for API v2):

```python
import tweepy
import pandas as pd

# Twitter/X API v2
client = tweepy.Client(bearer_token="YOUR_BEARER_TOKEN")

# Read IDs from CSV, keeping Snowflake IDs as text rather than numbers
df = pd.read_csv("tweets_bukele_ids_publico.csv", dtype={"id": str})
ids = df["id"].tolist()

# Hydrate (max 100 IDs per request)
tweets = client.get_tweets(
    ids=ids[:100],
    tweet_fields=["text", "created_at", "author_id"]
)
```
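
Because each request accepts at most 100 IDs, hydrating the full corpus means looping over batches and tracking which IDs did not come back. The two helpers below are a sketch of that bookkeeping in plain Python; the names `batches` and `unresolved` are illustrative and not part of tweepy or of this repository's scripts.

```python
def batches(ids, size=100):
    """Yield consecutive slices of `ids`, each holding at most `size` items."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def unresolved(requested, returned):
    """Return the requested IDs that are absent from the returned IDs,
    for example tweets deleted or made unavailable after collection."""
    found = {str(tweet_id) for tweet_id in returned}
    return [tweet_id for tweet_id in requested if str(tweet_id) not in found]


# Possible usage with the client defined above (not executed here):
#   for batch in batches(ids):
#       response = client.get_tweets(ids=batch, tweet_fields=["created_at"])
#       got = [tweet.id for tweet in (response.data or [])]
#       missing = unresolved(batch, got)
```

Rate limits depend on the API tier, so a real loop would also need to pause between requests or construct the client with `wait_on_rate_limit=True`.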

Note: Twitter/X Developer Policy section 4.A permits academic redistribution 
of tweet IDs for non-commercial research purposes.

---

## Collection Methodology

Data was collected using a custom pipeline combining R (v4.5.2) and Python (v3.14) 
via the reticulate package. Browser automation was performed with Playwright 
(Python v1.58.0, Chromium) using a persistent session architecture.

Collection strategy:
  Phase 1: Daily sweep (one query per day, Jan 2012 to Mar 2026)
  Phase 2: Sub-daily re-query for days with >= 18 tweets, split into four 6-hour UTC windows
  Phase 3: Verification of critical political event periods

Known limitation: Twitter/X free-tier accounts are subject to a ~20 result 
display cap per search query. Days with high tweet volume may be underrepresented 
despite the multi-phase strategy.
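
The windowing logic of Phases 1 and 2 can be sketched as follows. This is an illustration under stated assumptions, not an excerpt from `scraper.py`: all function names are hypothetical, and the query string built by `daily_query` assumes the standard `from:`, `since:` and `until:` search operators, which may differ from what the actual pipeline sends.

```python
from datetime import date, datetime, timedelta, timezone


def daily_windows(first_day, last_day):
    """Phase 1: yield (start, end) date pairs, one per calendar day."""
    day = first_day
    while day <= last_day:
        yield day, day + timedelta(days=1)
        day += timedelta(days=1)


def needs_requery(tweet_count, threshold=18):
    """Phase 2 trigger: a day close to the ~20-result cap is re-queried."""
    return tweet_count >= threshold


def six_hour_windows(day):
    """Phase 2: split one UTC day into four consecutive 6-hour windows."""
    start = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
    for k in range(4):
        yield start + timedelta(hours=6 * k), start + timedelta(hours=6 * (k + 1))


def daily_query(handle, day):
    """Build a one-day search query (assumed operator syntax)."""
    return f"from:{handle} since:{day.isoformat()} until:{(day + timedelta(days=1)).isoformat()}"
```

Setting the re-query threshold slightly below the display cap means that days returning 18 or 19 results are also split, which guards against a cap that is only approximately 20.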

For complete methodology, see codebook_bukele.docx.

---

## Ethical and Legal Statement

This dataset was collected for non-commercial academic research purposes only.
All content consists of public tweets voluntarily published by a head of state.
No private accounts or protected content were accessed.

The scripts are published for methodological transparency and reproducibility.
Their use is subject to Twitter/X Terms of Service.

---

## Suggested Citation (APA 7th Edition)

[Author(s)]. (2026). @nayibbukele Twitter/X Corpus (2012-2026) [Dataset]. 
Harvard Dataverse. https://doi.org/[REPLACE WITH DOI AFTER PUBLICATION]

---

## License

Creative Commons Attribution 4.0 International (CC BY 4.0)
https://creativecommons.org/licenses/by/4.0/

---

## Related Dataset

@lopezobrador_ Twitter/X Corpus (2009-2026) — available separately on 
Harvard Dataverse. Both datasets share identical structure and were collected 
using the same methodology, enabling direct comparative analysis.
DOI: [to be added upon publication]
